feat(huggingFace): add HuggingFaceModelResource for model browsing and media proxy#5124
feat(huggingFace): add HuggingFaceModelResource for model browsing and media proxy#5124PG1204 wants to merge 11 commits into
Conversation
…d media proxy Introduces a new Jersey REST resource exposing endpoints used by the upcoming HuggingFace operator UI: - GET /api/huggingface/models — browse / search models per task - GET /api/huggingface/tasks — list HF pipeline tags with hosted inference - POST /api/huggingface/upload-audio — upload audio for HF audio tasks - GET /api/huggingface/audio-preview — stream uploaded audio (path-validated) - GET /api/huggingface/media-proxy — proxy remote media URLs to bypass CORS This is the first PR in a stacked series landing the HF operator end-to-end. No operator code yet; this resource is independently useful and lets the frontend integrate with HF before the operator class lands.
|
/request-review @Ma77Ball |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5124 +/- ##
============================================
- Coverage 49.16% 47.85% -1.31%
- Complexity 2384 2401 +17
============================================
Files 1051 1043 -8
Lines 40350 40261 -89
Branches 4279 4302 +23
============================================
- Hits 19837 19268 -569
- Misses 19353 19834 +481
+ Partials 1160 1159 -1
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@PG1204 Thanks for opening this PR! Please do the following:
|
|
Thank you for the suggestions. Will update the PR accordingly. |
|
Hi @PG1204, while I begin my review, please address @Yicong-Huang's feedback. Specifically:
Thanks, and looking forward to the updates! |
Ma77Ball
left a comment
There was a problem hiding this comment.
Please review and resolve the comments and ask any questions as needed.
|
/request-review @Ma77Ball requesting re-review for the changes. |
|
/request-review @xuang7 |
Addresses xuang7's review on PR apache#5124 — both endpoints previously buffered the full payload into a heap-resident byte[] with no upper bound, leaving the JVM open to OOM on a hostile or buggy upstream response (/media-proxy) or out-of-band write into the audio temp dir (/audio-preview). - /media-proxy: switch from Unirest.asBytes() to asObject(Function<RawResponse, T>), streaming the upstream body in 8 KiB chunks with a running byte counter. Aborts with 413 if the declared Content-Length exceeds the cap (pre-check) or if the body crosses the cap mid-read (defends against missing/lying Content-Length). New MAX_MEDIA_PROXY_BYTES = 50 MiB, sized for HF inference media (text-to-image ~5 MiB, text-to-video ~30 MiB) with headroom. - /audio-preview: add Files.size() defense-in-depth check before readAllBytes. /upload-audio already enforces MAX_AUDIO_BYTES on ingest; this catches the case where a bug or out-of-band write puts an oversized file in the temp dir. Adds a spec covering the audio-preview cap using a sparse-file fixture so the test stays fast (87/87 spec passes). The media-proxy cap path is exercised via the existing input-validation suite plus the new streamMediaWithCap helper - a follow-up can add a fake-RawResponse unit test if reviewers want explicit coverage of the chunked-read cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eration Splits the monolithic 1,278-line HuggingFaceInferenceOpDesc from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation) end-to-end. - TaskCodegen trait + CodegenContext model the per-task variation - PythonCodegenBase emits the shared provider-fallback / process_table / _parse_response infrastructure with two holes for the per-task payload and parse snippets - TextGenCodegen supplies text-generation's chat-completions payload and the body["choices"][0]["message"]["content"] parse branch - HuggingFaceInferenceOpDesc becomes a thin dispatcher (~180 lines) holding @JsonProperty fields and the registeredCodegens map User-input string fields are typed as EncodableString and emitted via the pyb"..." macro so values reach Python as self.decode_python_template('<base64>') rather than raw literals; class constants are assigned in open(self) so self is in scope for the decode call. Generated process_table runs a defensive _HF_MODEL_ID_PATTERN check at runtime before any HF URL is composed. PR 2 of a stacked 9-PR series. PR 1 (apache#5124) ships the supporting REST resource; PRs 3-5 will add image, audio + media-gen, and QA/ranking task families by registering new *Codegen objects in the dispatcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@Ma77Ball Would you prefer that I resolve the conversations or you'd rather resolve them. If any of the comments still require work, I shall work on them and update the PR. |
…eration Splits the monolithic 1,278-line HuggingFaceInferenceOpDesc from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation) end-to-end. - TaskCodegen trait + CodegenContext model the per-task variation - PythonCodegenBase emits the shared provider-fallback / process_table / _parse_response infrastructure with two holes for the per-task payload and parse snippets - TextGenCodegen supplies text-generation's chat-completions payload and the body["choices"][0]["message"]["content"] parse branch - HuggingFaceInferenceOpDesc becomes a thin dispatcher (~180 lines) holding @JsonProperty fields and the registeredCodegens map User-input string fields are typed as EncodableString and emitted via the pyb"..." macro so values reach Python as self.decode_python_template('<base64>') rather than raw literals; class constants are assigned in open(self) so self is in scope for the decode call. Generated process_table runs a defensive _HF_MODEL_ID_PATTERN check at runtime before any HF URL is composed. PR 2 of a stacked 9-PR series. PR 1 (apache#5124) ships the supporting REST resource; PRs 3-5 will add image, audio + media-gen, and QA/ranking task families by registering new *Codegen objects in the dispatcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review on apache#5124 (xuang7, Ma77Ball): mark the resource with @RolesAllowed(Array("REGULAR", "ADMIN")) to document that all five endpoints require an authenticated user. The annotation isn't enforced yet — that's coming with the auth-enforcement PR @Yicong-Huang and @Ma77Ball are working on — but adding it now means no follow-up change is needed when enforcement lands, and it matches the convention used by UserConfigResource / AdminSettingsResource. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eration Splits the monolithic 1,278-line HuggingFaceInferenceOpDesc from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation) end-to-end. - TaskCodegen trait + CodegenContext model the per-task variation - PythonCodegenBase emits the shared provider-fallback / process_table / _parse_response infrastructure with two holes for the per-task payload and parse snippets - TextGenCodegen supplies text-generation's chat-completions payload and the body["choices"][0]["message"]["content"] parse branch - HuggingFaceInferenceOpDesc becomes a thin dispatcher (~180 lines) holding @JsonProperty fields and the registeredCodegens map User-input string fields are typed as EncodableString and emitted via the pyb"..." macro so values reach Python as self.decode_python_template('<base64>') rather than raw literals; class constants are assigned in open(self) so self is in scope for the decode call. Generated process_table runs a defensive _HF_MODEL_ID_PATTERN check at runtime before any HF URL is composed. PR 2 of a stacked 9-PR series. PR 1 (apache#5124) ships the supporting REST resource; PRs 3-5 will add image, audio + media-gen, and QA/ranking task families by registering new *Codegen objects in the dispatcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What changes were proposed in this PR?
Introduces
HuggingFaceModelResource- a Jersey REST resource at/api/huggingface/*that backs the upcoming HuggingFace operator's model picker, audio upload, and media preview UI. Five endpoints:GET /api/huggingface/models?task=…[&search=…]GET /api/huggingface/tasksPOST /api/huggingface/upload-audio?filename=…GET /api/huggingface/audio-preview?path=…GET /api/huggingface/media-proxy?url=…Plus a single-line registration of the resource in
TexeraWebApplication.Architectural notes:
X-HF-Tokenrequest header (forwarded by the frontend from the operator's property panel in a follow-up PR). When absent, requests go to HF Hub anonymously. There is no server-side env-var token.Cache(size + TTL) for/modelsand/tasksresults. User-token requests bypass the cache to avoid serving one user's token-scoped list to another./upload-audioreadsInputStreamstraight to disk in 8 KB chunks with a 25 MiB cap (returns 413 on exceedance) - the request body is never buffered in memory. Extension allowlist rejects non-audio types up front./media-proxyrequires the URL's host to be in an allowlist (HF, fal.media, replicate.delivery/com) with a leading-dot suffix guard against lookalike domains./tasksuses a dedicatedForkJoinPool(4)for its per-task probe instead of the JVM's global common pool, with explicit 429/503 detection that logs at WARN.X-Texera-Truncated: trueheader when results were capped, so the frontend can show "list incomplete" hints.Any related issues, documentation, or discussions?
Tracked in #5134 & #5041(umbrella issue for the HuggingFace operator end-to-end implementation). This PR is the backend foundation; subsequent PRs will add the operator class, frontend property panel, result-panel media rendering, and developer documentation.
Closes #5134
How was this PR tested?
amber/src/test/scala/.../HuggingFaceModelResourceSpec.scala- 86 ScalaTest cases covering token sanitization, SSRF allowlist (including lookalike-domain rejection), JSON error escaping, MIME type inference, the audio-upload validation/size-cap/extension paths, audio-preview path validation and traversal rejection, media-proxy rejection paths, cache hit/bypass semantics, and the temp-dir sweep. Run withsbt 'WorkflowExecutionService/testOnly org.apache.texera.web.resource.HuggingFaceModelResourceSpec'- all 86 pass in ~6 seconds, no external network required.GET /api/huggingface/tasksreturns the expected JSON task list.GET /api/huggingface/models?task=text-generationreturns the paginated model list;text-generationshows theX-Texera-Truncated: trueheader whenMAX_PAGES=50is hit.POST /upload-audio?filename=evil.sh→ 400 (extension allowlist).POST /upload-audiowith a 30 MiB body → 413 (size cap).GET /media-proxy?url=http://localhost:8080/→ 403 (SSRF allowlist).Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.7 in compliance with ASF